A Semantics-Based Clustering Approach for Online Laboratories Using K-Means and HAC Algorithms

Authors

Abstract

Due to the availability of a vast amount of unstructured data in various forms (e.g., the web, social networks, etc.), clustering text documents has become increasingly important. Traditional clustering algorithms have not been able to solve this problem because the semantic relationships between words could not accurately represent the meaning of the documents. Thus, semantic document clustering has been extensively utilized to enhance the quality of text clustering. This is an unsupervised learning method, and it involves grouping documents based on their meaning, not on common keywords. This paper introduces a new method that groups documents from online laboratory repositories based on a semantic similarity approach. In this work, the dataset is first collected by crawling the short real-time descriptions of online laboratories' repositories from the Web. A vector space is created using term frequency-inverse document frequency (TF-IDF), and clustering is done using the K-Means and Hierarchical Agglomerative Clustering (HAC) algorithms with different linkages. Three scenarios are considered: without preprocessing (WoPP), preprocessing with stemming (PPwS), and preprocessing without stemming (PPWoS). Several metrics are used to evaluate the experiments: Silhouette average, purity, V-measure, F1-measure, accuracy score, homogeneity score, completeness score, and NMI score, on five datasets: online labs, 20 NewsGroups, Txt_sentoken, NLTK_Brown, and NLTK_Reuters. Finally, by creating an interactive webpage, the results of the proposed work are contrasted and visualized.
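The pipeline outlined above (TF-IDF vectors, then K-Means and HAC with different linkages, scored against known classes) can be sketched in plain Python. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' code. The abstract names the three scenarios but not their exact steps, so the sketch assumes that "preprocessing" means lowercasing plus stop-word removal. The stop-word list, the `naive_stem` suffix stripper (a hypothetical stand-in for a real stemmer such as Porter's), the farthest-first seeding, and the toy `DOCS`/`LABELS` corpus are all illustrative assumptions. K-Means is run in its spherical (cosine) variant, and only purity among the listed metrics is computed.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "are"}

# Hypothetical toy corpus with known classes (not the paper's dataset).
DOCS = [
    "Remote physics lab for pendulum experiments",
    "Physics lab with pendulum and optics",
    "Remote chemistry lab for titration",
    "Chemistry lab with titration and reaction kinetics",
]
LABELS = ["physics", "physics", "chemistry", "chemistry"]

def naive_stem(token):
    """Hypothetical crude suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, scenario):
    """Tokenize under an assumed reading of the scenarios 'WoPP', 'PPwS' and 'PPWoS'."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if scenario == "WoPP":  # without preprocessing: raw tokens, case kept
        return tokens
    tokens = [t.lower() for t in tokens if t.lower() not in STOPWORDS]
    if scenario == "PPwS":  # preprocessing with stemming
        tokens = [naive_stem(t) for t in tokens]
    return tokens  # 'PPWoS': lowercasing and stop-word removal only

def tfidf(docs):
    """L2-normalized TF-IDF vectors as sparse dicts, with smoothed idf = ln((1+n)/(1+df)) + 1."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(d).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity; valid because every vector here has unit length."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())

def kmeans(vectors, k, iterations=20):
    """Spherical k-means: cosine distance, centroids re-normalized to unit length."""
    centroids = [vectors[0]]  # deterministic farthest-first seeding (an assumption)
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(cosine_distance(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: cosine_distance(v, centroids[j])) for v in vectors]
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    total.update(v)  # element-wise sum of the member vectors
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm:  # an empty cluster keeps its previous centroid
                centroids[j] = {t: w / norm for t, w in total.items()}
    return labels

def hac(vectors, k, linkage="average"):
    """Agglomerative clustering down to k clusters with single, complete or average linkage."""
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    combine = {"single": min, "complete": max, "average": lambda ds: sum(ds) / len(ds)}[linkage]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = combine([dist[i][j] for i in clusters[x] for j in clusters[y]])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters.pop(y))  # y > x, so index x is unaffected by the pop
    # label each document with the index of the cluster that contains it
    return [next(c for c, m in enumerate(clusters) if i in m) for i in range(len(vectors))]

def purity(labels, truth):
    """Share of documents that belong to the majority true class of their cluster."""
    hits = sum(Counter(t for lab, t in zip(labels, truth) if lab == c).most_common(1)[0][1]
               for c in set(labels))
    return hits / len(labels)

if __name__ == "__main__":
    for scenario in ("WoPP", "PPwS", "PPWoS"):
        vectors = tfidf([preprocess(d, scenario) for d in DOCS])
        print(scenario, "k-means purity:", purity(kmeans(vectors, 2), LABELS))
        for linkage in ("single", "complete", "average"):
            print(scenario, linkage, "HAC purity:", purity(hac(vectors, 2, linkage), LABELS))
```

Because the TF-IDF vectors are unit-length, cosine distance is used throughout. In a real experiment, scikit-learn's `TfidfVectorizer`, `KMeans`, and `AgglomerativeClustering` would replace the hand-written parts, and `sklearn.metrics` provides `silhouette_score`, `v_measure_score`, `homogeneity_score`, `completeness_score`, and `normalized_mutual_info_score` for the other metrics listed.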


Similar Resources

A Hybrid Implementation of K-Means and HAC Algorithm and Its Comparison with other Clustering Algorithms

There is a huge amount of data being produced every day in the Information Technology industry, but it is of no use until it is converted into useful information. Data mining is defined as the process of extracting hidden predictive information from large databases. Data mining provides an easy and timesaving concept to extract useful information from a large database instead of going through...


Extraction based approach for text summarization using k-means clustering

This paper describes an algorithm that incorporates k-means clustering, term-frequency inverse-document-frequency, and tokenization to perform extraction-based text summarization.


A study of K-Means-based algorithms for constrained clustering

The problem of clustering with constraints has received considerable attention in the last decade. Indeed, several algorithms have been proposed, but only a few studies have (partially) compared their performances. In this work, three well-known algorithms for k-means-based clustering with soft constraints — Constrained Vector Quantization Error (CVQE), its variant named LCVQE, and the Metric P...


MLK-Means - A Hybrid Machine Learning based K-Means Clustering Algorithms for Document Clustering

Document clustering is useful in many information retrieval tasks such as document browsing, organization, and viewing of retrieval results. Such tasks are currently the subject of significant global research. Generative models based on the multivariate Bernoulli and multinomial distributions have been widely used for text classification. In this work, we address a new hybrid algorithm call...


An Algorithm for Online K-Means Clustering

This paper shows that one can be competitive with the k-means objective while operating online. In this model, the algorithm receives vectors v1, ..., vn one by one in an arbitrary order. For each vector vt, the algorithm outputs a cluster identifier before receiving vt+1. Our online algorithm generates Õ(k) clusters whose k-means cost is Õ(W*), where W* is the optimal k-means cost using k cl...


Journal

Journal title: Mathematics

Year: 2023

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math11030548